The Scikit-Learn package provides two synthetic data generators, make_classification and make_blobs, for testing classification models.
The arguments and return values of make_classification are as follows.
X, y = make_classification(n_samples=100, n_features=20, n_informative=2, n_redundant=2,
                           n_repeated=0, n_classes=2, n_clusters_per_class=2,
                           weights=None, flip_y=0.01, class_sep=1.0, hypercube=True,
                           shift=0.0, scale=1.0, shuffle=True, random_state=None)
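Arguments:
n_samples : number of samples, default 100
n_features : total number of features (independent variables), default 20
n_informative : number of features that carry information about the class label, default 2
n_redundant : number of features generated as random linear combinations of the informative features, default 2
n_repeated : number of features duplicated from the informative and redundant features, default 0
n_classes : number of classes, default 2
n_clusters_per_class : number of clusters per class, default 2
weights : proportion of samples assigned to each class; None gives balanced classes
flip_y : fraction of samples whose class label is assigned at random, default 0.01
class_sep : factor that spreads the class clusters farther apart as it grows, default 1.0
random_state : seed for random number generation
Return values:
X : array of shape [n_samples, n_features], the generated features
y : array of shape [n_samples], the integer class label of each sample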
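As a quick check of the return values, the following minimal sketch calls make_classification with its default sizes and inspects the result. The seed value 0 is an arbitrary choice made for this illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification

# random_state=0 is an arbitrary seed chosen for this sketch
X, y = make_classification(random_state=0)

print(X.shape)       # (n_samples, n_features), using the defaults 100 and 20
print(y.shape)       # (n_samples,)
print(np.unique(y))  # the distinct class labels

# Calling again with the same seed should reproduce the same data
X_again, y_again = make_classification(random_state=0)
same_data = np.array_equal(X, X_again) and np.array_equal(y, y_again)
print(same_data)
```

Fixing random_state this way is what makes the plots in the cells below repeatable from run to run.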
In [1]:
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
In [2]:
X, y = make_classification(n_features=1, n_redundant=0, n_informative=1, n_clusters_per_class=1, random_state=4)
plt.scatter(X, y, marker='o', c=y, s=100)
plt.show()
In [3]:
plt.title("One informative feature, one cluster per class")
X, y = make_classification(n_features=2, n_redundant=0, n_informative=1, n_clusters_per_class=1, random_state=4)
plt.scatter(X[:, 0], X[:, 1], marker='o', c=y, s=100)
plt.show()
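In the example above only one of the two features is informative; the other is filled with random noise. One way to see this numerically is to turn off shuffling, in which case the informative features come first in X and the noise features last, and then compare how strongly each column correlates with y. This is a rough sketch: n_samples=1000, flip_y=0 and shuffle=False are assumptions added here to make the comparison stable, and they are not used in the cell above.

```python
import numpy as np
from sklearn.datasets import make_classification

# shuffle=False keeps the column order: informative features first, noise last
X, y = make_classification(n_samples=1000, n_features=2, n_redundant=0,
                           n_informative=1, n_clusters_per_class=1,
                           flip_y=0, shuffle=False, random_state=4)

# Correlation of each column with the class label
corr_informative = np.corrcoef(X[:, 0], y)[0, 1]
corr_noise = np.corrcoef(X[:, 1], y)[0, 1]

print(abs(corr_informative))  # expected to be clearly away from zero
print(abs(corr_noise))        # expected to be close to zero
```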
In [4]:
plt.title("Two informative features, one cluster per class")
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1, random_state=6)
plt.scatter(X[:, 0], X[:, 1], marker='o', c=y, s=100)
plt.show()
In [6]:
plt.title("Two informative features, one cluster per class, different weight")
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1, weights=[0.9, 0.1], random_state=6)
plt.scatter(X[:, 0], X[:, 1], marker='o', c=y, s=100)
plt.show()
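The effect of weights can also be checked by counting the samples in each class. The sketch below repeats the call from the cell above and tallies the labels with np.bincount. Because flip_y defaults to 0.01, a small number of labels may be reassigned at random, so the counts are only approximately 90 and 10.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           n_clusters_per_class=1, weights=[0.9, 0.1],
                           random_state=6)

# Number of samples per class label, and the corresponding proportions
counts = np.bincount(y)
print(counts)
print(counts / counts.sum())
```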
In [7]:
plt.title("Two informative features, two clusters per class")
X2, Y2 = make_classification(n_features=2, n_redundant=0, n_informative=2, random_state=2)
plt.scatter(X2[:, 0], X2[:, 1], marker='o', c=Y2, s=100)
plt.show()
In [8]:
plt.title("Multi-class, two informative features, one cluster per class")
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1, n_classes=3)
plt.scatter(X[:, 0], X[:, 1], marker='o', c=y, s=100)
plt.show()
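The three-class example sets n_clusters_per_class=1 for a reason: make_classification requires n_classes * n_clusters_per_class to be at most 2 ** n_informative. With two informative features that limit is 4, so three classes with the default two clusters each (6 clusters) are rejected. The sketch below triggers that error on purpose; the exact wording of the message may differ between Scikit-Learn versions.

```python
from sklearn.datasets import make_classification

try:
    # 3 classes * 2 clusters per class = 6 clusters, but 2 ** 2 = 4 is the limit
    make_classification(n_features=2, n_redundant=0, n_informative=2,
                        n_clusters_per_class=2, n_classes=3)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```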
The arguments and return values of make_blobs are as follows.
X, y = make_blobs(n_samples=100, n_features=2, centers=3,
                  cluster_std=1.0, center_box=(-10.0, 10.0),
                  shuffle=True, random_state=None)
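Arguments:
n_samples : number of samples, default 100
n_features : number of features, default 2
centers : number of clusters to generate, or an array of fixed center locations
cluster_std : standard deviation of the clusters, default 1.0
center_box : (min, max) bounding box within which cluster centers are drawn at random, default (-10.0, 10.0)
shuffle : whether to shuffle the samples, default True
random_state : seed for random number generation
Return values:
X : array of shape [n_samples, n_features], the generated samples
y : array of shape [n_samples], the integer cluster label of each sample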
In [9]:
from sklearn.datasets import make_blobs
In [10]:
plt.title("Three blobs")
X1, Y1 = make_blobs(n_features=2, centers=3)
plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1, s=100)
plt.show()
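Instead of a count, centers can also be given as an array of explicit locations, which makes the result easier to verify. The following sketch is an illustration under assumed values: the three center coordinates, n_samples=300, cluster_std=0.5 and the seed are all chosen here and do not come from the cell above. It checks that the mean of each generated cluster lies close to the center that was requested.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Hypothetical center locations chosen for this sketch
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])

X, y = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
                  random_state=0)

# Label k marks the samples drawn around centers[k]
cluster_means = np.array([X[y == k].mean(axis=0) for k in range(len(centers))])
print(cluster_means)
```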